Abstract

We give efficient data-oblivious algorithms for several fundamental geometric problems that are
relevant to geographic information systems, including planar convex hulls and all-nearest neighbors.
Our methods are "data-oblivious" in that they don't perform any data-dependent operations, with the
exception of operations performed inside low-level blackbox circuits having a constant number of inputs
and outputs. Thus, an adversary who observes the control flow of one of our algorithms, but who cannot
see the inputs and outputs to the blackbox circuits, cannot learn anything about the input or output.
This behavior makes our methods applicable to secure multiparty computation (SMC) protocols for
geographic data used in location-based services. In SMC protocols, multiple parties wish to perform a
computation on their combined data without revealing individual data to the other parties. For instance,
our methods can be used to solve a problem posed by Du and Atallah, where Alice has a set, $A$, of $m$
private points in the plane, Bob has another set, $B$, of $n$ private points in the plane, and Alice and Bob
want to jointly compute the convex hull of $A \cup B$ without disclosing any more information than what can
be derived from the answer. In particular, neither Alice nor Bob wants to reveal any of their respective
points that are in the interior of the convex hull of $A \cup B$.

Keywords: data-oblivious algorithms, convex hulls, compressed quadtrees, closest pairs, all nearest 
neighbors, well-separated pairs decomposition, secure multi-party computations. 

1 Introduction

As handheld devices containing GPS receivers have become more popular, so have location-based services
using them. In particular, the emergence of location-based mobile social networking services, such as
GyPSii, Pelago, Loopt and Google Latitude, is revolutionizing social networking. In these applications, the
location of a handheld device is a critical component of a social-networking computation, and sometimes is
even the sole attribute of interest with respect to input from the user, such as for real-time traffic or real-time
friend location. (See Figure 1.)

Nevertheless, an individual's physical location is often considered private information and revealing it 
to networked applications poses serious privacy and security risks. For example, an employee might want to 
conceal from her employer that she is interviewing with a rival and a husband might want to conceal from 
his wife where he is shopping for her birthday present, not to mention the privacy concerns associated with 
trips to a hospital, police station, or court. Even just revealing that one is not at home could be a risk if that 
information is discovered by thieves. Thus, although participating in social location-based services can have 
significant benefits, many users will likely be reluctant to participate without solid privacy protections. 







Figure 1: Mock-up of a GPS-based cellphone app for identifying the locations of people from two different
organizations. (Map image is from openstreetmap.org; public-domain cellphone image is by Tibounise.)



1.1 Secure Multi-party Computations 

One way of formalizing privacy requirements for geographic data is through secure multi-party computation
(SMC) protocols (e.g., see [7, 15, 20, 21, 35, 36]), in which two or more parties hold different subsets of a
collection of data values, $\{x_1, x_2, \ldots, x_n\}$, and are interested in computing some function, $f(x_1, x_2, \ldots, x_n)$,
on these values. Due to privacy concerns, none of the parties is willing to reveal specific values of his or her
pieces of data. SMC protocols allow the parties to compute the value of $f$ on their collective input values
without revealing any of their specific data values (other than what can be inferred from the output value of
function $f$). One of the main tools for building SMC protocols is to encode the function $f$ as a circuit and
then simulate an evaluation of this circuit using cryptographically-masked values [7, 35]. By unmasking
only the output value(s), the parties can learn the value of $f$ without revealing their own data. Unfortunately,
from a practical standpoint, encoding entire computations as circuits can involve significant blow-ups in
space and time.



1.2 Data-Oblivious Algorithms

The time and space overhead incurred by SMC protocols can be managed more efficiently, however, by
using data-oblivious algorithms to drive SMC computations [48]. A data-oblivious computation consists of
a sequence of data accesses that do not depend on the input values. All functions that combine data values 
are encapsulated into black box operations, with a constant number of inputs and outputs. The control flow 
depends only on the input size, the problem being solved, and, in the case of randomized algorithms, the 
values of random variables. A classical example of an oblivious algorithm is a sorting network, an algorithm 
that sorts its data values by routing them through black box comparators that take as input pairs of values 
and produce as output the minimum and maximum of the pair. However, unlike sorting networks, which 
are usually described as circuits of comparators, we allow data-oblivious algorithms to be structured as
conventional sequential algorithms using random-access memory, looping, and recursion. An algorithm is 
data-oblivious, in our model, if two inputs of the same size have the same distribution of possible memory 
accesses. 
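
The definition above can be illustrated with a small sorting network. The following is a minimal sketch, assuming odd-even transposition sort as the network (chosen only for brevity; it takes quadratic time and is not one of the $O(n \log n)$ oblivious sorts cited in this paper). The names `compare_exchange`, `odd_even_transposition_sort`, and `trace` are hypothetical. The `trace` list records the sequence of memory locations touched, which is what an observer of the control flow would see.

```python
def compare_exchange(a, i, j, trace):
    """Black-box comparator: leave the minimum at index i and the maximum at j."""
    trace.append((i, j))  # the only information visible outside the black box
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = lo, hi


def odd_even_transposition_sort(values):
    """Sort with a comparator schedule that depends only on len(values).

    Assumption: a simple quadratic network, used here purely as an
    illustration of data-obliviousness.
    """
    a = list(values)
    n = len(a)
    trace = []
    for round_index in range(n):          # n rounds suffice for n items
        start = round_index % 2           # alternate even and odd pairs
        for i in range(start, n - 1, 2):
            compare_exchange(a, i, i + 1, trace)
    return a, trace


if __name__ == "__main__":
    sorted_a, trace_a = odd_even_transposition_sort([5, 1, 4, 2, 3])
    sorted_b, trace_b = odd_even_transposition_sort([1, 2, 3, 4, 5])
    print(sorted_a, trace_a == trace_b)
```

Two inputs of the same size yield identical traces, which matches the requirement that the memory accesses have the same distribution for all inputs of a given size.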

An adversary who can see all the control flow of a data-oblivious algorithm and all the memory addresses 
it accesses, but who cannot see any actual memory values or the results of any of its black box computations, 
cannot learn anything about the specific inputs. Therefore, in an SMC simulation of a data-oblivious
algorithm, the masked evaluation of each black box may be performed independently, avoiding the blow-ups that
arise from cryptographic simulation of non-constant-sized circuits. The parts of the algorithm outside of the 
black boxes may be performed directly and openly rather than being simulated, with masked data values 
taking the place of their unmasked values in the algorithm's memory. The resulting SMC algorithms will 
be considerably more efficient than an SMC simulation of a non-oblivious algorithm. Our aim, therefore, 
is to design oblivious algorithms for geometric problems to be used via SMC simulations as components of 
privacy-preserving location-based services. 



1.3 Problems of Interest 

In this paper, we study several classic geometric problems for geographic data, including the following, for
which efficient conventional algorithms can be found in computational geometry textbooks (e.g., see
[19, 22, 38, 41, ...]):



• convex hull: given n labeled points in the plane, return the labels of those on the boundary of the 
smallest convex set containing the set of points. 



• quadtree: given n points in the plane, construct a representation of the compressed point-set quadtree [44]
for this set of points.



• closest pair: given n points in the plane, return the pair that are the closest. 

• all nearest neighbors: given n labeled points in the plane, return, for each point, the label of its nearest 
neighbor point. 
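
To make the input and output conventions of the last two problems concrete, here is a minimal sketch of a quadratic-time baseline, not one of the algorithms of this paper. It assumes points are given as `(label, x, y)` tuples; the names `select`, `all_nearest_neighbors`, and `closest_pair` are hypothetical. Every pair is visited in a fixed order and results are updated through a constant-size `select` black box, so the access pattern depends only on n.

```python
import math


def select(flag, x, y):
    """Constant-size black box: return x if flag is set, otherwise y."""
    return x if flag else y


def all_nearest_neighbors(points):
    """points: list of (label, x, y). Returns {label: label of nearest other point}."""
    n = len(points)
    result = {}
    for i in range(n):
        li, xi, yi = points[i]
        best_d, best_label = math.inf, None
        for j in range(n):                      # fixed scan over all partners
            lj, xj, yj = points[j]
            d = (xi - xj) ** 2 + (yi - yj) ** 2  # squared Euclidean distance
            better = (i != j) and (d < best_d)
            best_d = select(better, d, best_d)
            best_label = select(better, lj, best_label)
        result[li] = best_label
    return result


def closest_pair(points):
    """Return the pair of labels whose points are closest (all pairs scanned)."""
    n = len(points)
    best_d, best_pair = math.inf, None
    for i in range(n):
        for j in range(i + 1, n):
            li, xi, yi = points[i]
            lj, xj, yj = points[j]
            d = (xi - xj) ** 2 + (yi - yj) ** 2
            better = d < best_d
            best_d = select(better, d, best_d)
            best_pair = select(better, (li, lj), best_pair)
    return best_pair


if __name__ == "__main__":
    pts = [("a", 0, 0), ("b", 1, 0), ("c", 5, 5)]
    print(all_nearest_neighbors(pts), closest_pair(pts))
```

Such a baseline is useful mainly as a correctness reference when testing faster oblivious methods.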

In addition, we study a more specialized problem, the construction of a well-separated pair decomposition,
which was introduced by Callahan and Kosaraju [14], who also showed how it can be used to solve the all
nearest neighbor problem in a way that generalizes an approach of Vaidya [47]. Chan [16] gives a linear-time
algorithm for computing a well-separated pair decomposition in the case of integer coordinates.



1.4 Related Prior Work 



The general topic of privacy for location-based services is of considerable interest in GIS (e.g., see
[11, 12, 17, 23, 30, 34]). Of all the problems listed above, the convex hull and nearest-neighbor problems are
probably the most well-motivated for geographic data. For instance, Stojmenovic et al. [46] use convex
hulls of nearest neighbors for greedy routing in wireless networks. Getz and Wilmers [24] use unions of
convex hulls of nearest neighbors to construct species home ranges from GPS data. Basch, Guibas, and
Hershberger [6] give data structures for maintaining convex hulls and closest pairs for mobile geographic
data. Likewise, at previous ACM GIS conferences, Henrich et al. [29] use convex hulls to define geographic
footprints for geographic database queries, Liu and Lee [33] use convex hulls to study wireless location
using non-line-of-sight radio signals, and Buchin et al. [13] use convex hulls to characterize similar parts
of trajectories. In addition, there is considerable prior work on answering nearest-neighbor queries for both
static and mobile GPS data (e.g., see [8, 28, 40, 43]). None of these prior algorithms is data-oblivious.

In addition to the work on SMC protocols cited above, Du and Atallah [20] survey SMC protocols and
mention several geometric problems, including planar convex hull, as being of interest for SMC. Atallah and
Du [4] specifically address privacy-preserving computational geometry SMC protocols, including two-party
protocols for point-in-polygon detection, polygon intersection detection, and closest-pair finding, although
there have been some questions regarding the correctness of some of these methods.¹ Li and Dai [32] study
several low-level primitives for privacy-preserving geometric computations and give a protocol of
complexity $O(n^2)$ for computing the closest red-blue pair between a set of red points and blue points in the plane.
Wang et al. [49] and, independently, Wang and Zhang [50] present SMC protocols for two-party convex hull
construction, with quadratic communication complexity. Hans et al. [27] give an improved SMC protocol
for convex hull construction, using a protocol with complexity $O(n \log n)$, but their method is non-oblivious
and reveals all the points on the convex hull, whereas our method can be used to selectively reveal only
certain types of points of interest. In addition, Li et al. [31] present a quadratic SMC protocol for approximate
three-dimensional convex hulls.

Goldreich and Ostrovsky [25] give a general construction for converting a non-oblivious algorithm into
an oblivious one. Their simulation has an $O(\log^3 n)$ blow-up in time, which results in inefficient oblivious
algorithms if applied to existing computational geometry algorithms for the problems we address.



1.5 Our Results 

We give data-oblivious algorithms for planar convex hull construction, well-separated pair decomposition,
compressed quadtree construction, closest pairs, and all nearest neighbor finding in a set of $n$ points. Our
methods run in $O(n \log n)$ time and, using known SMC protocols (e.g., see [7, 15, 20, 21, 35, 36]), result in
privacy-preserving two-party protocols for performing joint computations of these geometric algorithms on
private data held separately by two parties, with communication complexities that are $O(n \log n)$ times the
complexities for the low-level SMC protocols (used to simulate our low-level blackbox computations). In
addition, we also give oblivious algorithms for list ranking, tree contraction, and all nearest larger values,
with similar $O(n \log n)$ running times.

Our optimal oblivious convex hull algorithm involves the use of a new geometric classification for 
common tangent finding for two convex polygons separated by a line, which extends the Overmars and 
van Leeuwen classification [39] to subsequences of edges of the respective polygons. Our optimal oblivious 
algorithms for all nearest neighbors, closest pairs, compressed quadtree construction, and well-separated 
pair decompositions, on the other hand, depend more on new combinatorial insights than geometric ones, in 
that our methods are based on new oblivious computations for list ranking, tree contraction, and all nearest 
larger values. 



2 Oblivious Convex Hulls 

Suppose we are given an array $A$ of $n$ points in the plane, sorted by their $x$-coordinates (since it is possible
to sort $A$ obliviously in $O(n \log n)$ time [2, 26]). The desired output is for each point $p$ in $A$ to be labeled
with a pair of points, $(q, r)$, that form the upper convex hull edge that is intersected by a vertical line through
p. If p is on the upper convex hull, then (q, r) is the convex hull edge that follows p in the clockwise 
direction. This assumption about the output format could also be replaced, without changing the overall 
running time, by a compact listing of the upper hull vertices padded with "blank" points so that the total 
size is n (since we must maintain the data-oblivious nature of our method). This alternative output format 
could be produced, for instance, by performing a compaction operation on the uncompressed output format 
of labeling each point with its vertical upper convex hull edge. For instance, in a privacy-preserving
two-party protocol, Alice could hold a set, $A$, of $n$ blue points and Bob could hold a set, $B$, of $n$ red points,
and we could use the oblivious convex hull algorithm we describe in this section to let Alice and Bob each
learn which of their respective points are on the convex hull of $A \cup B$. (See Figure 2.)

¹ Du, private communication.




Figure 2: A two-party convex hull problem. Alice holds a set, $A$, of blue points (triangles) and Bob holds a
set, $B$, of red points (circles). Each should learn only which of their points are on the convex hull of $A \cup B$.
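
The output format used in this section, in which each point is labeled with the upper hull edge above it, can be sketched with a conventional (non-oblivious) reference computation. This is only an illustration of the format, not the oblivious method of this paper. It assumes points are `(x, y)` tuples sorted by distinct x-coordinates, and it labels the rightmost point with `None` because no hull edge lies to its right (an assumption made here in place of the dummy vertical edges). The names `cross`, `upper_hull`, and `label_with_upper_edges` are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); negative for a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper hull vertices, left to right, of points sorted by distinct x."""
    hull = []
    for p in points:
        # Keep only clockwise turns; pop vertices that fall on or below the hull.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def label_with_upper_edges(points):
    """Return a list of (q, r) edges, one per point, in input order.

    A hull vertex is labeled with the hull edge incident to it on the right.
    Assumption: the rightmost point gets None.
    """
    hull = upper_hull(points)
    labels = []
    k = 0  # index of the hull edge (hull[k], hull[k + 1]) currently above p
    for p in points:
        while k + 1 < len(hull) and hull[k + 1][0] <= p[0]:
            k += 1
        labels.append(None if k == len(hull) - 1 else (hull[k], hull[k + 1]))
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (1, 2), (2, 0), (3, 1), (4, 0)]
    for p, edge in zip(pts, label_with_upper_edges(pts)):
        print(p, edge)
```

Note that this reference version branches on the data (the `while` loops), which is exactly what an oblivious algorithm must avoid.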

2.1 Background for Our Approach 

Before we present our oblivious convex hull algorithm, we briefly mention alternative approaches that do
not result in optimal oblivious algorithms. A standard approach is to use the divide-and-conquer paradigm,
dividing $A$ into its first and second halves, $A_1$ and $A_2$, and recursively constructing the upper hulls, $UH(A_1)$
and $UH(A_2)$, of the points in $A_1$ and $A_2$, respectively. The problem that remains is to find the upper tangent
segment between $UH(A_1)$ and $UH(A_2)$, and label all the points under this tangent segment to have this
segment as their upper convex hull edge. So let us focus on the computation of the upper tangent, $(q, r)$,
between $UH(A_1)$ and $UH(A_2)$. Since the points in $UH(A_1)$ and $UH(A_2)$ are ordered by their $x$-coordinates,
we can apply the binary search of Overmars and van Leeuwen [39] to find the upper tangent. The main idea of
this method is to probe at two vertices $p \in UH(A_1)$ and $q \in UH(A_2)$ and use the relative positions of the
edges next to $p$ and $q$ to determine which portions of $UH(A_1)$ and/or $UH(A_2)$ can be safely eliminated
as candidates for the upper tangent points $q$ and $r$, respectively. The case analysis is shown in Figure 3 and
results in a running time of $O(\log n)$ for finding the common upper tangent.

Broadcasting this tangent to members of $UH(A_1)$ and $UH(A_2)$ and comparing each edge to the tangent
allows us to then produce the desired output. Unfortunately, the binary search process of Overmars and
van Leeuwen is not oblivious. Moreover, implementing it with the oblivious RAM simulation of [25] blows
up the running time for tangent finding from $O(\log n)$ to $O(n \log^4 n)$, which results in a running time of
$O(n \log^5 n)$ for oblivious convex hull construction. Nevertheless, our method borrows from this approach
the idea of using divide-and-conquer.

An alternative divide-and-conquer approach is suggested by the parallel convex hull algorithm of Atallah
and Goodrich [5]. In this case, rather than perform a binary search to find the upper tangent, they perform an
$O(n^{1/2})$-way search in parallel. This $O(n^{1/2})$-way approach is another technique we borrow, but in a way
quite different from that used by Atallah and Goodrich, as their method is nonoblivious in how it finds upper
tangents. Implementing their algorithm using a simulation of a PRAM with an oblivious RAM requires
$O(n \log n)$ time to find the upper tangent, which leads to a running time of $O(n \log^2 n)$ for oblivious convex
hull construction. Instead, our oblivious method is based on a novel geometric characterization of when
edges of subsequences of the two upper hulls come before or after the tangent line in terms of an ordering by
decreasing slopes. This alternative approach allows us to achieve a running time of $O(n \log n)$ for oblivious
convex hull construction.

Figure 3 (panels Case a, Case b, Case c): The cases for the Overmars-van Leeuwen binary search for a common upper tangent. Each case
shows the relative orientation of two points, which are respectively on two different upper hulls, and the
portion(s) of each hull that can be eliminated as potential locations for the points of tangency.

2.2 Our Oblivious Convex Hull Method 

Given a set of points, $A$, ordered by their $x$-coordinates, we define the format of the upper hull, $UH(A)$, of
$A$, as follows. For each point $p$ in $A$, we label $p$ with the edge, $e(p)$, of the upper convex hull that is
intersected by a vertical line through the point $p$. If $p$ is itself on the upper hull, then we label $p$ with the
upper hull edge incident to $p$ on the right. To simplify the description of our algorithm, we assume no two
points in $A$ share the same $x$-coordinate.

Our method is as follows. Divide $A$ into its first and second halves, $A_1$ and $A_2$, by a vertical line $V$ and
recursively construct the upper hulls, $UH(A_1)$ and $UH(A_2)$, of the points in $A_1$ and $A_2$, respectively, with
representations as described above. In addition, we assume, without loss of generality, that $UH(A_1)$ and
$UH(A_2)$ are each augmented with vertical dummy edges incident on the first and last vertices in $UH(A_1)$
and $UH(A_2)$, respectively. The problem that remains is to find the upper tangent segment between $UH(A_1)$
and $UH(A_2)$, and label all the points under this tangent segment to have this segment as their upper convex
hull edge. So let us focus on the computation of the upper tangent, $(q, r)$, between $UH(A_1)$ and $UH(A_2)$.

We aim to assign each edge $e$ of $UH(A_1)$ and $UH(A_2)$ one of two labels:

• L: the tangent line of $UH(A_1 \cup A_2)$ with the same slope as $e$ is tangent to $UH(A_1)$.

• R: the tangent line of $UH(A_1 \cup A_2)$ with the same slope as $e$ is tangent to $UH(A_2)$.

In some intermediate steps, however, we may be unable to determine yet whether an edge should be labeled
L or R; in such cases, we temporarily label it with an X.
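
The meaning of the L and R labels can be checked directly from their definition with a brute-force reference computation. This is a minimal sketch, not the comparison rule used by our algorithm: it assumes integer coordinates, non-vertical edges given as `((x1, y1), (x2, y2))` with `x1 < x2`, and general position (no ties), and the name `label_edge` is hypothetical. For a slope s, the tangent line of the combined upper hull with slope s passes through the point maximizing y - s·x, so the label records which of the two sets contains that point.

```python
def label_edge(edge, a1, a2):
    """Label a non-vertical hull edge 'L' or 'R' straight from the definition.

    a1, a2: point lists on the left and right of the dividing line.
    Assumption: ties (an edge parallel to the common tangent) do not occur;
    this sketch would report them as 'L'.
    """
    (x1, y1), (x2, y2) = edge
    dx, dy = x2 - x1, y2 - y1  # dx > 0 for a non-vertical edge

    def height(p):
        # dx times the intercept of the line through p with slope dy/dx.
        # Multiplying by dx > 0 keeps the arithmetic exact for integers.
        return dx * p[1] - dy * p[0]

    best_left = max(height(p) for p in a1)
    best_right = max(height(p) for p in a2)
    return "L" if best_left >= best_right else "R"


if __name__ == "__main__":
    a1 = [(0, 0), (1, 3), (2, 0)]
    a2 = [(3, 0), (4, 5), (5, 0)]
    # The combined upper hull is (0,0), (1,3), (4,5), (5,0).
    print(label_edge(((0, 0), (1, 3)), a1, a2))  # edge of UH(A1) on the hull
    print(label_edge(((1, 3), (2, 0)), a1, a2))  # edge of UH(A1) off the hull
    print(label_edge(((3, 0), (4, 5)), a1, a2))  # edge of UH(A2) off the hull
    print(label_edge(((4, 5), (5, 0)), a1, a2))  # edge of UH(A2) on the hull
```

In this example the edges of the left hull that survive in the combined hull are labeled L and the surviving edges of the right hull are labeled R.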

If an edge of $UH(A_1)$ gets label L, then it is part of $UH(A_1 \cup A_2)$. If it instead gets label R, then we
know it is not part of $UH(A_1 \cup A_2)$. Similar considerations hold for edges of $UH(A_2)$ and their labels. Thus,
if we can label each edge in $UH(A_1)$ as L or R, with no edges labeled X, then we can immediately identify
the vertex of tangency on $UH(A_1)$: it is the vertex incident on the two edges respectively labeled L and R.
All edges before this point will be labeled L and all edges after this point will be labeled R. Likewise, a
similar property holds for $UH(A_2)$. To aid in our characterization of the edges of $UH(A_1)$ and $UH(A_2)$, we
have the following.

Lemma 1: Let $H_1$ and $H_2$ be (possibly disconnected) subsequences of the edges of $UH(A_1)$ and $UH(A_2)$,
respectively, ordered by decreasing slopes and both containing the dummy vertical edges from $UH(A_1)$ and
$UH(A_2)$ as their respective first and last edges. For a non-vertical edge $e$ in $H_1$ (resp., $H_2$), let $d$ be the
edge in $H_2$ (resp., $H_1$) with smallest slope greater than that of $e$ and let $f$ be the edge in $H_2$ (resp., $H_1$) with largest
slope less than that of $e$, noting that $d$ and $f$ are not necessarily consecutive edges in $UH(A_2)$ (resp., $UH(A_1)$).
Then there is a simple comparison rule involving only $d$, $e$, and $f$, with the result being one of the following
outcomes:

• $e$ is correctly labeled L or R.

• $e$ is labeled X, but $d$ is correctly labeled L and $f$ is correctly labeled R.

Proof: Without loss of generality, let us assume $e$ is in $H_1$ and $d$ and $f$ are in $H_2$, and all three edges
have distinct slopes. Also, let $V$ be the vertical line separating $A_1$ and $A_2$, and let $\ell(e)$, $\ell(d)$, and $\ell(f)$
denote the lines containing $e$, $d$, and $f$, respectively. We distinguish four cases, with respect to the points
$a = \ell(e) \cap \ell(d)$ and $b = \ell(e) \cap \ell(f)$, as to whether (i) $a$ and $b$ are both to the left of $V$, (ii) $a$ and $b$ are
both to the right of $V$, (iii) $a$ is to the left of $V$ and $b$ is to the right of $V$, or (iv) $a$ is to the right of $V$ and
$b$ is to the left of $V$. Note first that case (i) is impossible, since it would require the portion of $\ell(d)$ to the
right of $V$ to be completely above $\ell(f)$ to the right of $V$ (since $d$ has to be below $\ell(f)$ to the right of $V$). For
cases (ii), (iii), and (iv), we illustrate the possibilities in Figure 4. For each instance, if there is more than one
possible applicable case according to the Overmars-van Leeuwen classification (OvL cases a through i2),
we choose the one that is the most pessimistic with respect to how much we can determine about the labels
of the respective edges. In Case (ii), either $e$ is labeled R, or $d$ is labeled L and $f$ is labeled R. In Case (iii), $e$ is
labeled R. Finally, in Case (iv), $e$ is labeled L. □

Let $N = \lceil \sqrt{n}\, \rceil$, and let $A'_1$ and $A'_2$ respectively denote the subarrays of $A_1$ and $A_2$ consisting of the
points with indices at multiples of $N$. Let $H_1$ be the subsequence of the recursively constructed upper hull
$UH(A_1)$ consisting of the edges that have at least one endpoint vertex in $A'_1$. Define $H_2$ similarly with
respect to $UH(A_2)$ and $A'_2$. Thus, $H_1$ and $H_2$ have size $O(n^{1/2})$. We perform a round in our computation
as follows:

1. We perform an (oblivious) brute-force computation to compare every pair of edges in $H_1 \cup H_2$, so
as to determine for each non-vertical edge $e$ in $H_1$ (resp., $H_2$), the edge, $d$, in $H_2$ (resp., $H_1$) with
smallest slope greater than that of $e$ and the edge, $f$, in $H_2$ (resp., $H_1$) with largest slope less than that of $e$.

2. For each edge $e$, use Lemma 1 to label $e$ with L, R, or X, as a blackbox computation applied to each
edge, with its associated edges $d$ and $f$.

3. Perform another brute-force comparison of every pair of edges in $H_1 \cup H_2$ to label edges $d$ and $f$ as
L and R, respectively, for any edge $e$ whose blackbox computation determined these labels for $d$ and $f$.

4. Perform a forward scan and reverse scan on $H_1$ and $H_2$ using a blackbox computation that labels any
edge to the left of an L-labeled edge as L and any edge to the right of an R-labeled edge as R.

Note that all of the above steps can be performed obliviously in $O(n)$ time.
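
Steps 1 and 4 of the round can be sketched as follows. This is a minimal illustration under simplifying assumptions: each edge is represented only by its slope, the dummy vertical edges are modeled as slopes `math.inf` and `-math.inf`, and the names `slope_neighbors` and `propagate_labels` are hypothetical. Both functions visit their inputs in a fixed order and update state only through constant-size conditional assignments, so the sequence of positions read does not depend on the slopes or labels.

```python
import math


def slope_neighbors(h1, h2):
    """Step 1: for each slope e in h1, scan all of h2 to find (d, f).

    d is the smallest slope in h2 greater than e, and f is the largest
    slope in h2 less than e. Assumption: h2 contains the dummy slopes
    +inf and -inf, so both always exist.
    """
    result = []
    for e in h1:
        d, f = math.inf, -math.inf
        for s in h2:                       # every pair is compared
            d = s if (e < s < d) else d    # constant-size conditional update
            f = s if (f < s < e) else f
        result.append((d, f))
    return result


def propagate_labels(labels):
    """Step 4: labels is a left-to-right list of 'L', 'R', or 'X'.

    The forward scan marks everything right of an R as R; the reverse
    scan marks everything left of an L as L. Assumption: the input is
    consistent (no L appears to the right of an R).
    """
    out = list(labels)
    seen_r = False
    for i in range(len(out)):                  # forward scan
        out[i] = "R" if seen_r else out[i]
        seen_r = seen_r or out[i] == "R"
    seen_l = False
    for i in range(len(out) - 1, -1, -1):      # reverse scan
        out[i] = "L" if seen_l else out[i]
        seen_l = seen_l or out[i] == "L"
    return out


if __name__ == "__main__":
    print(slope_neighbors([2, 0.5, -1], [math.inf, 3, 1, 0, -2, -math.inf]))
    print(propagate_labels(["X", "L", "X", "X", "R", "X"]))
```

With lists of size $O(n^{1/2})$, the all-pairs scan in `slope_neighbors` performs $O(n)$ comparisons, which is why a brute-force comparison fits within the linear time bound of a round.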

Lemma 2: After the above round computation completes, at most one of the subsequences $H_1$ or $H_2$ can
contain edges labeled X.

Proof: After each application of Lemma 1 to an edge $e$ in $H_1$ or $H_2$, we either label $e$ with an L or R or we
label $e$ with an X and its associated edges $d$ and $f$ as L and R. Note that in the latter case the forward and
reverse scans will then completely label all the edges of the other list (not containing $e$) as L or R; hence,
this list will contain no edges labeled X. That is, if we label any edge $e$ as X, then we label all the edges in
the other list as L or R. If, on the other hand, we don't label any edge $e$ in $H_1$ (resp., $H_2$) as X, then there
are clearly no edges in $H_1$ (resp., $H_2$) that are labeled as X. □

Although at most one of $H_1$ or $H_2$ can have an edge labeled X, the edges in this list may almost
all be labeled X. Thus the above round computation will reduce the candidate tangent vertices in the
representation of one of $UH(A_1)$ or $UH(A_2)$ (but not necessarily both) to a subregion of size $O(n^{1/2})$. If
it reduces $UH(A_1)$, call it a red-1 reduction and if it reduces $UH(A_2)$, call it a blue-1 reduction. A second
application of the round computation will either reduce the other list to a subregion of size $O(n^{1/2})$ (i.e.,
it will be a red-1 or blue-1 reduction) or it will reduce the first subregion to a single vertex of tangency,
which we call a red-2 or blue-2 operation, depending on whether it occurs to $UH(A_1)$ or $UH(A_2)$. In either
case, two more applications of the round computation will determine the tangent edge between $UH(A_1)$ and
$UH(A_2)$.

As described above, this sequence of applications of the round computation is nonoblivious, but it can be
made oblivious by considering all valid sequences of red-1, blue-1, red-2, and blue-2, in turn. One of these
constant number of operation sequences will be the correct sequence to find the upper tangent. By trying
all these possibilities obliviously (with conditional no-ops for paths not taken) we will perform the one that
leads to the determination of the tangent between $UH(A_1)$ and $UH(A_2)$.

Figure 4: The possible cases for the configurations of $\ell(e)$, $\ell(d)$, and $\ell(f)$. In some of the cases there are two
possible locations for $e$, $d$, or $f$ relative to the lines containing these edges, in which case we use subscripts
1 or 2 to distinguish the two relative locations. For each scenario, we list a set of comparisons and their
results according to the OvL classification, together with the resulting classification of $e$, $d$, and/or $f$. Of
particular note is the $(e_2, f)$ comparison in Case (iv), for this is an example of an OvL
Case h where we can classify $e_2$ as L, since it is impossible for any edges of $UH(A_2)$ to be above $\ell(e)$ in
this scenario.

An important implementation detail is the oblivious method for doing the reduction in a red-1 or blue-1
operation. A red-2 or blue-2 operation, which reduces a set of size $O(n^{1/2})$ to an object of size $O(1)$, can
be done obliviously in a single scan using a constant-size register. In the red-1 or blue-1 operation, we have
an array $A$ of size $O(n)$ for which we want to isolate a subregion of size $O(n^{1/2})$ based on the labels of its
boundary elements, and copy it into a buffer, $B$, of size $O(n^{1/2})$. For each subregion, we read the boundary
elements and use them to set a register flag, $F$, that determines whether this is the region that should be
copied. We then read the $i$-th element, $B[i]$, from our buffer, perform a conditional swap (based on $F$) with
the $i$-th element in this region of $A$, and write the result back to $B[i]$. Thus, in an oblivious way, we can
copy a subregion of interest into the buffer $B$, with the total computation taking $O(n)$ time.
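
The buffer-copy step of a red-1 or blue-1 operation can be sketched as follows. This is a minimal illustration under stated assumptions: the array length is a multiple of the region size `m`, elements are `(label, value)` pairs, and the predicate `is_target`, which decides from a region's two boundary elements whether it is the one to copy, is a hypothetical stand-in for the actual boundary test. The names `cond_swap` and `copy_marked_region` are also hypothetical.

```python
def cond_swap(flag, x, y):
    """Constant-size black box: return (y, x) if flag is set, else (x, y)."""
    return (y, x) if flag else (x, y)


def copy_marked_region(a, m, is_target, blank=None):
    """Copy the one region of size m selected by is_target into a buffer.

    Every region is visited and every buffer slot is read and written for
    every region, so the access pattern depends only on len(a) and m.
    """
    a = list(a)                 # work on a copy of the input array
    b = [blank] * m             # buffer B of size m
    for r in range(len(a) // m):
        first, last = a[r * m], a[r * m + m - 1]
        f = is_target(first, last)          # register flag F
        for i in range(m):
            # Read B[i], conditionally swap with the i-th element of the
            # region, and write both back regardless of the flag.
            b[i], a[r * m + i] = cond_swap(f, b[i], a[r * m + i])
    return b


if __name__ == "__main__":
    labeled = [("L", 10), ("L", 11), ("L", 12),
               ("X", 13), ("X", 14), ("R", 15),
               ("R", 16), ("R", 17), ("R", 18)]
    # Hypothetical boundary test: the region is not entirely L and not entirely R.
    pick = lambda first, last: first[0] != "R" and last[0] != "L"
    print(copy_marked_region(labeled, 3, pick))
```

The total work is one pass over the array with a constant number of operations per element, which is consistent with the linear time bound for the reduction.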

Summarizing, we can perform the determination of the upper tangent of $UH(A_1)$ and $UH(A_2)$
obliviously in $O(n)$ time, including the scan of $UH(A_1)$ concatenated with $UH(A_2)$ to relabel any vertices under
this tangent with an identifier for this tangent. A similar construction applies to the lower hull of $A$. Thus,
we have the following result.

Theorem 3: Given a set $S$ of $n$ points in the plane, we can obliviously construct a representation of the
convex hull of $S$ in $O(n \log n)$ time.
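
The running time follows from the standard divide-and-conquer recurrence. As a short worked derivation, assume $n$ is a power of two and let $c$ be a constant (introduced here only for illustration) bounding the per-element cost of the oblivious merge step and of the base case:

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + c\,n, \qquad T(1) = c, \\
T(n) &= c\,n \log_2 n + c\,n = O(n \log n).
\end{align*}
```

The initial oblivious sort by $x$-coordinate also takes $O(n \log n)$ time, so it does not change the overall bound.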

Using Theorem 3, we can then apply standard cryptographic circuit simulation methods to derive a
secure multiparty computation involving private data (e.g., see [7, 15, 20, 21, 35, 36]). Hence, we obtain a
secure two-party protocol for Alice and Bob to determine which of their respective points belong to the
convex hull of the union of their $n$ points with a communication complexity of $O(n \log n)$.

Corollary 4: There is a secure two-party protocol that computes the convex hull of the union of two private
sets of points of total size $n$ with $O(n \log n)$ communication complexity.



3 Some Combinatorial Problems 

We next turn to oblivious algorithms for some combinatorial problems that crop up in our methods for our 
other geometric algorithms. 

3.1 All Nearest Larger Values 

In the All Nearest Larger Values (ANLV) problem [9], we are given an array $A$ of $n$ numbers, such that, for
each value, $A[i]$, we want to determine the values $A[j]$ and $A[k]$, where $j$ is the largest index less than $i$ with
$A[j] > A[i]$ and $k$ is the smallest index greater than $i$ with $A[k] > A[i]$. As observed by Berkman et al. [9],
this problem is actually a generalization of the problem of merging two sorted lists, $C$ and $D$, since these
lists can be merged by solving an ANLV problem for an array that consists of a reversal of $C$ followed by
$D$. Our oblivious method for solving the ANLV problem, where we assume without loss of generality that
the values are distinct, is different from that of Berkman et al. and is as follows.

1. Build a complete binary tree, $T$, "on top" of the items in $A$ and perform a bottom-up tournament
computation to compute, for each $v$ in $T$, the value, $M(v)$, which is the maximum value stored in a
descendant of $v$ in $T$. This is a straightforward oblivious computation that takes $O(n)$ time.

2. For each leaf $x$ in $T$, let $l(x)$ denote the lowest node in $T$ such that $l(x)$ is a left sibling of an ancestor
of $x$ in $T$ and $M(l(x)) > A[x]$, where $A[x]$ is the value in $A$ associated with $x$. Likewise, let
$r(x)$ denote the lowest node in $T$ such that $r(x)$ is a right sibling of an ancestor of $x$ in $T$ and
$M(r(x)) > A[x]$. We compute $l(x)$ and $r(x)$, which
are initially null for each $x$, in a divide-and-conquer computation, with respect to a node $v$ in $T$. In
this computation, we recursively compute the $l$ and $r$ labels for nodes in the subtrees rooted at $v$'s
left and right children, $u$ and $w$, producing lists, $D(u)$ and $D(w)$, of labeled descendants of $u$ and $w$.
Then, we scan $D(u)$ to assign $r(x) = w$ for each $x$ such that $A[x] < M(w)$ and $r(x)$ was previously
null. Also, we scan the list $D(w)$ to assign $l(x) = u$ for each $x$ such that $A[x] < M(u)$ and $l(x)$ was
previously null. We then concatenate these two lists of labeled nodes (some of which are still null) to
create the list $D(v)$ for $v$, which is passed up to $v$'s parent. This step runs in $O(n \log n)$ time.

3. For each node $v$ in $T$ with left child $u$ and right child $w$, we perform a scan of $D(u)$ to find the
smallest value $A[x]$ such that $A[x] > M(w)$, if it exists, and we scan back through $D(w)$ to label the
value $A[y]$ in $w$'s list such that $A[y] = M(w)$ to show that $A[x]$ (if it exists) is the nearest larger value
to the left of $A[y]$. Likewise, we perform a scan of $D(w)$ to find the smallest value $A[x]$ such that
$A[x] > M(u)$, if it exists, and we scan back through $D(u)$ to label the value $A[y]$ in $u$'s list such that
$A[y] = M(u)$ to show that $A[x]$ (if it exists) is the nearest larger value to the right of $A[y]$. (These
scans are used to take care of the boundary values for each node.) Finally, when we are done with all
the scans, we perform a sorting step to report back to each node the labels that have been found for it.
This step runs in $O(n \log n)$ time.

4. For each element $x$ stored in a leaf of the binary tree $T$, let us create two tuples, (l(x), A[x], "right", -i, L, R)
and (r(x), A[x], "left", i, L, R), where $i$ is the index of the value $A[x]$ in $A$, and $L$ and $R$ are the left
and right ANLVs for $A[x]$ (most of which are probably null at this point). Perform an oblivious sort
of all these tuples, using a lexicographic ordering rule, to produce the sorted list, $B$, of such tuples.
This step takes $O(n \log n)$ time.

5. Scan the list B in reverse order. During this scan we maintain three registers, v, L and R. The register
v is the label of the current node, v, for which we are computing ANLV's, that is, the first coordinate
of the tuples we are scanning. The scan for each v is essentially a merge of its left and right children's
lists of nodes whose ANLV is determined by this merge at v. The register L is maintained to be the
smallest "left" value in a tuple with this v as its first (r(x)) coordinate. The register R is maintained to
be the smallest "right" value in a tuple with this v as its first (l(x)) coordinate. Whenever we encounter
a tuple, if it is a "right" tuple, we identify its left ANLV as L, and if it is a "left" tuple, we identify its
right ANLV as R, assuming we have not already determined this value previously (which coincides
with the point where we reset the register v). This scan can be done obliviously in O(n) time.

6. Perform one more sort to bring together the computed left and right ANLV's for each node x in T.
This step can be done obliviously in O(n log n) time, and it completes the algorithm.

Thus, we have the following. 

Theorem 5: Given an array A of n values, we can obliviously solve the ANLV problem for A in O(n log n)
time.
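
To make the output of an ANLV computation concrete, the following is a minimal reference sketch in Python. It is not the oblivious algorithm described above: it uses the standard stack-based scan, whose control flow depends on the data, and is only meant as a checker for small inputs. The function name anlv_reference is hypothetical, and the sketch assumes that "larger" means strictly larger and that answers are reported as indices into A (None when no larger value exists).

```python
def anlv_reference(A):
    """Non-oblivious reference: for each i, return the index of the nearest
    strictly larger value to the left and to the right of A[i] (or None)."""
    n = len(A)
    left = [None] * n
    right = [None] * n

    stack = []  # indices whose values are strictly decreasing
    for i in range(n):
        while stack and A[stack[-1]] <= A[i]:
            stack.pop()
        left[i] = stack[-1] if stack else None
        stack.append(i)

    stack = []
    for i in reversed(range(n)):
        while stack and A[stack[-1]] <= A[i]:
            stack.pop()
        right[i] = stack[-1] if stack else None
        stack.append(i)

    return left, right


left, right = anlv_reference([3, 1, 4, 1, 5])
```

An oblivious implementation can be compared against this reference on random arrays.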

3.2 List Ranking 

In the list ranking problem [3, 18], we are given a linked list, L, stored in the records of an array of size
n, for which we want to compute, for each node v, the number of nodes from v to the end of the list, using
pointer hopping as the distance metric.

Theorem 6: Given a linked list L of n nodes, we can obliviously perform a list ranking of the nodes in L,
using a computation that always runs in O(n log n) time and succeeds with high probability.

Proof: To solve the list ranking problem on a list L of n nodes obliviously, we perform the following 
actions: 
1. Create for each node v in L a field, d(v), which stores an indication of the distance from v to the end
of L. Initially, each d(v) = 1.

2. Generate a random bit, b(v), for every node v in L. 

3. Perform two oblivious sorts so as to "link out" each node w with b(w) = 0 that follows a node v with
b(v) = 1, storing with w a reference to the name of the node, u, that currently follows w in L. In addition,
with this link out step, we update d(v) = d(v) + d(w).

4. Repeat the previous two steps a constant, c, number of times, until it is likely, with high probability, 
that the connected part of L has at most half as many nodes. 

5. If |L| > n/log n, perform an oblivious sorting step and compression to reduce the working storage
used for L to be half its previous size. Then repeat the above computation starting with Step 2.

6. If |L| < n/log n, then perform O(log n) link-out steps, where we apply a link-out operation, like
the one above, for every node in parallel, using oblivious sorts to perform the actions in an oblivious
fashion. The total running time for all these actions is O(((n/log n) log n) log n) = O(n log n).

7. Reverse the above link-out steps, in reverse order, so that, for each node w that was linked out in
step i, we perform two oblivious sorting steps to communicate the information needed so that we can
update d(w) = d(w) + d(u), where u was the node that followed w when it was linked out.

Since we reduce the number of nodes, and the working storage for L, by half every c steps, with high
probability, and we then reverse these actions to finally solve the list ranking problem, we get that the running
time of this method is a geometric sum that is O(n log n). Moreover, since we terminate the halving
process and switch to a parallel link-out process when |L| < n/log n, we get that this method succeeds in
computing a list ranking for L with high probability. □
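
The link-out and reversal logic of the proof can be sketched as follows. This is a plain sequential Python sketch under simplifying assumptions, not the oblivious procedure itself: it uses direct array accesses in place of oblivious sorts, omits the storage compression and the switch to parallel link-outs, and simply repeats random link-out rounds until one node remains. The name list_rank, the seed parameter, and the convention that nxt[v] is None for the last node are hypothetical choices made for illustration.

```python
import random


def list_rank(nxt, seed=0):
    """nxt[v] is the index of the node after v, or None for the last node.
    Returns rank[v] = number of nodes from v to the end of the list."""
    n = len(nxt)
    if n == 0:
        return []
    nxt = list(nxt)
    d = [1] * n                 # Step 1: d(v) = 1 for every node
    alive = set(range(n))
    log = []                    # (w, u): w was linked out while followed by u
    rng = random.Random(seed)

    while len(alive) > 1:
        # Step 2: one random bit per remaining node.
        b = {v: rng.randint(0, 1) for v in alive}
        # Step 3: pick pairs v -> w with b(v) = 1 and b(w) = 0.
        picks = [(v, nxt[v]) for v in alive
                 if b[v] == 1 and nxt[v] is not None and b[nxt[v]] == 0]
        for v, w in picks:
            log.append((w, nxt[w]))
            d[v] += d[w]
            nxt[v] = nxt[w]
            alive.discard(w)

    # Step 7: undo the link-outs in reverse order.
    rank = [0] * n
    head = next(iter(alive))
    rank[head] = d[head]
    for w, u in reversed(log):
        rank[w] = d[w] + (rank[u] if u is not None else 0)
    return rank


# The list 3 -> 0 -> 4 -> 1 -> 2, stored by node index.
ranks = list_rank([4, 2, None, 0, 1])
```

Because a linked-out node w has b(w) = 0 and its predecessor has b(v) = 1, no two adjacent nodes are removed in the same round, which is what makes the reversal in Step 7 well defined.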

3.3 Tree Contraction 

In a tree contraction [1, 37, 42] computation, we are given a proper binary tree T such that each leaf node is
associated with a value and each internal node is associated with an arithmetic operation to be computed on
its two children. The goal is to efficiently compute the value of each node in T in an oblivious fashion, even
if the height of T is O(n).

Theorem 7: Given a binary arithmetic tree, T, with n nodes, we can obliviously compute the value of each
internal node of T in O(n log n) time, in a computation that succeeds with high probability.

Proof: Adapting a parallel algorithm of Abrahamson et al. [1], we can solve the tree contraction problem
obliviously as follows.

1. Perform a list ranking operation to number the leaves of T from 1 to N. Using the list ranking algorithm
of Theorem 6, this step takes O(n log n) time and succeeds with high probability.

2. For each node v in T that is an odd-numbered leaf and a left child of its parent, link out v and its
parent, making v's sibling to be the new child of v's grandparent. In doing this operation, record
for v and its parent the iteration it is removed and the names of the grandparent and sibling nodes at
this point. In addition, in the link-out operation, we label the child-parent edge with an O(1)-sized
algebraic operation to apply in going from the child value to the parent (which is composable when
we combine previously-computed edges in a link-out). This step can be performed obliviously using
O(1) oblivious sorting steps.

3. For each node v in T that is an odd-numbered leaf and a right child of its parent, link out v and its
parent, making v's sibling to be the new child of v's grandparent. In doing this operation, record for
v and its parent the iteration it is removed and the names of the grandparent and sibling nodes at this
point, along with any edge updates as in the previous step. This step can be performed obliviously using
O(1) oblivious sorting steps.

4. If |T| > 1, divide the leaf number of each leaf node by 2 and repeat the above two steps.

5. Reverse the above actions to compute the value of each internal node. 
This completes the proof. □ 
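
The composable, O(1)-sized edge operation used in the link-out steps can be illustrated with a small Python sketch. It assumes, purely for illustration, that the internal-node operations are addition and multiplication, so that every edge label can be stored as an affine map x -> a*x + b given by the pair (a, b). The names IDENTITY, apply_edge, compose and rake are hypothetical, and the sketch covers only the edge update for a single link-out, not the full oblivious contraction.

```python
IDENTITY = (1, 0)  # the affine map x -> 1*x + 0


def apply_edge(f, x):
    """Apply the affine edge label f = (a, b) to the value x."""
    a, b = f
    return a * x + b


def compose(f, g):
    """Return the affine map x -> f(g(x)), again as a pair (a, b)."""
    return (f[0] * g[0], f[0] * g[1] + f[1])


def rake(op, leaf_value, leaf_edge, sibling_edge, parent_edge):
    """Link out a leaf and its parent; return the new edge label that takes
    the sibling's value directly to what the grandparent should receive."""
    c = apply_edge(leaf_edge, leaf_value)
    if op == "+":
        through_parent = (1, c)   # y -> y + c
    elif op == "*":
        through_parent = (c, 0)   # y -> c * y
    else:
        raise ValueError("this sketch only handles '+' and '*'")
    return compose(parent_edge, compose(through_parent, sibling_edge))


# Evaluate (x + 2) * 3 by two link-outs, leaving only the leaf x.
e1 = rake("+", 2, IDENTITY, IDENTITY, IDENTITY)   # x -> x + 2
e2 = rake("*", 3, IDENTITY, e1, IDENTITY)         # x -> 3 * (x + 2)
value = apply_edge(e2, 5)
```

Each label stays a pair of numbers no matter how many link-outs have been merged into it, which is the property the proof relies on.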



4 Quadtrees and Well-Separated Pair Decompositions 

Having described our methods for some fundamental combinatorial problems, we describe in this section
our oblivious algorithms for constructing a compressed quadtree and for forming a well-separated pair
decomposition.

4.1 Constructing a Compressed Quadtree 

A compressed quadtree (e.g., see [10, 16, 44]), for a set S of n points in the plane, normalized to have the unit
square, [0, 1] x [0, 1], as a bounding box, is defined as follows. A quadtree (e.g., see [44]) for S is defined
recursively, where we create a node v for the current bounding box and, if this bounding box has more than
a given threshold number of points of S, then we divide this box into four equal-sized boxes as quadrants,
and we recursively construct subtrees for each non-empty quadrant, with the nodes for these non-empty
quadrants having v as their parent. (See Figure 5.) If we then compress any chains of nodes in this quadtree
that have only one child, then we get the compressed quadtree. This definition is clearly not something that
leads to an oblivious construction algorithm, of course, but we can in fact construct a compressed quadtree
for S obliviously in O(n log n) time.

An alternative method for constructing a compressed quadtree, as observed by several researchers (e.g.,
see [10, 16, 44]), is based on a sorting of the points of S according to the interleaving order. In the interleaving
order, we take each point (x, y) and interleave the bits for x with the bits for y in a standard shuffling,
and we compare points according to this order. This order can also be interpreted geometrically [16] for the
sake of a comparison-based sorting algorithm. Once we have the points of S stored in an array A according
to the interleaving order, we note, as shown by Bern et al. [10], that the nodes contained in any compressed
quadtree box form a contiguous subsequence in A. Moreover, we can label each transition between two adjacent
points in A with the box that is formed along that transition, and we can then identify the compressed
quadtree box that contains each point p in A by performing an ANLV computation, where we use box size
to determine values in this ANLV computation. Given this information, we can perform a postprocessing
step consisting of two oblivious sorting steps to determine the adjacency information between the parent
and child nodes in the compressed quadtree. Thus, we have the following.

Theorem 8: Given a set S of n distinct points in the plane, we can obliviously construct a compressed
quadtree for S in O(n log n) time.
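
The interleaving order and the box sizes attached to transitions can be sketched in Python as follows. The sketch assumes that the points have already been scaled to non-negative integer grid coordinates with a fixed number of bits, and it uses Python's built-in sort as a stand-in for an oblivious sorting routine. The names interleave and transition_sides are hypothetical.

```python
def interleave(x, y, bits=16):
    """Shuffle the bits of x and y into a single key (x bit above y bit)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)
        key |= ((y >> i) & 1) << (2 * i)
    return key


def transition_sides(points, bits=16):
    """Sort points by interleaving order and, for each pair of adjacent
    points, return the side length of the smallest quadtree box that
    contains both of them."""
    keys = sorted(interleave(x, y, bits) for x, y in points)
    sides = []
    for a, b in zip(keys, keys[1:]):
        diff = a ^ b
        # The highest differing key bit fixes the level of the common box.
        level = (diff.bit_length() + 1) // 2
        sides.append(1 << level)
    return keys, sides


keys, sides = transition_sides([(0, 0), (1, 0), (0, 1), (3, 3)], bits=2)
```

The list of side lengths is the kind of array on which the ANLV computation described above would be run, with box size as the item value.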
Figure 5: A (region) quadtree for a set of points. (Public-domain image by David Eppstein.)

4.2 Well-Separated Pair Decomposition

Another important geometric computation is the construction of a well-separated pair decomposition (WSPD)
for a set S of n points in the plane. In a WSPD [14], we are given a parameter s for which we want to construct
a set of pairs, (A_1, B_1), (A_2, B_2), ..., (A_k, B_k), such that every pair of points p and q is represented
by a pair (A_i, B_i) such that p ∈ A_i and q ∈ B_i, or vice versa, and such that there are balls of radius r
containing A_i and B_i respectively so that these balls are of distance at least sr apart. In our applications we
choose s > 2 to be a constant, e.g., s = 2.1 will do.

We should note that some authors also insist that each pair of points p and q be represented exactly
once in some (A_i, B_i) pair in the WSPD. But duplicate representation is actually not a problem for most WSPD
applications, including the ones we consider, so we don't make this additional requirement. What is essential
in our definition is that the total number of pairs in a WSPD be linear.
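
The separation condition in this definition can be checked with a short Python sketch. It makes one simplifying assumption that is not part of the definition: each set is enclosed in a ball centered at the center of its bounding box, rather than in a smallest enclosing ball, so the test is sufficient but not necessary. The names enclosing_ball and well_separated are hypothetical.

```python
import math


def enclosing_ball(points):
    """Return (center, radius) of a ball around the bounding-box center."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    center = ((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0)
    radius = max(math.dist(center, p) for p in points)
    return center, radius


def well_separated(A, B, s=2.1):
    """True if balls of a common radius r around A and B are at least s*r apart."""
    ca, ra = enclosing_ball(A)
    cb, rb = enclosing_ball(B)
    r = max(ra, rb)                      # use one radius for both balls
    gap = math.dist(ca, cb) - 2.0 * r    # distance between the two balls
    return gap >= s * r


far = well_separated([(0, 0), (1, 0)], [(10, 0), (11, 0)])
near = well_separated([(0, 0), (1, 0)], [(2, 0), (3, 0)])
```

Here far is a well-separated pair for s = 2.1, while near is not, since the gap between the balls is smaller than s times their radius.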

Given a compressed quadtree for a set S of n points in the plane, Chan [16] shows that a WSPD can
be constructed in O(n) time by a simple (non-oblivious) recursive search algorithm defined on the nodes
of the compressed quadtree. Using a technique of Callahan and Kosaraju, Chan shows that the time and
combinatorial complexity for his algorithm is O(n) by using a packing argument, which shows that the
number of compressed quadtree boxes that are no smaller than a box b but are too close to be candidates for
a well-separated pair with b is bounded by a constant depending on s.

We define an alternative, oblivious construction algorithm for a well-separated pair decomposition by
turning this construction and argument "on its head." That is, we use the packing argument itself to construct
the WSPD. In particular, for each box B in the compressed quadtree, T, there are O(s^2) = O(1) boxes in
the uncompressed quadtree, T', that are the same size as B and are not well-separated from B. And for
each such box, B', there are O(1) immediate (children and grandchildren) descendants of the edge in T
corresponding to where B' is located in T'. These immediate descendants and the children of B in T
together form candidates for well-separated pairs. And the collection of all such sets of candidate pairs form
a superset of the pairs that are considered by the WSPD algorithm of Chan [16]. Thus, if we can consider
all such pairs and only keep the ones that form well-separated pairs, then we can construct a WSPD of
size O(n).

The challenge is to collect all such pairs. We do this as follows. 

1. For each box B in the compressed quadtree, T, form the set, B(B), of O(s^2) = O(1) boxes that are
the same size as B in the uncompressed quadtree and are not well-separated from B.

2. In parallel, for each B in T, pick a box B' in B(B) that was previously unconsidered, and create
two points p_B and q_B inside B' that do not fall inside the same child box of B' in the uncompressed
quadtree, T'. Call these points "dummy points."

3. Create a compressed quadtree, T̂, for all the points in S together with the dummy points created in
the previous step. Note that the box B' is in T̂, even if it is not in T.

4. Label each point in S with a 1 and each dummy point with a 0 and perform a tree contraction on
T̂, where each internal-node operation is a binary OR, to determine the binary value of each internal
node in T̂. Note that the nodes of T̂ that also exist in T will have at least two children that have binary
values equal to 1.

5. Remove all the nodes with binary values equal to 0 from T̂ and construct an Euler tour of its edges,
perform a list ranking in that Euler tour, and then an ANLV computation on the nodes in this list
using node degree as the item values. This computation gives us, for each box B' in a B(B) set, the
highest nodes in T whose boxes are contained in B'. These nodes and their children, together with the
children of B, form candidates for well-separated pairs. Identify which ones are indeed well-separated
and compress them into a list of answers produced in this round.

6. Repeat Steps 2 through 5 above until we have considered each box in a set B(B), for its box B.

Each of the above steps runs in O(n log n) time, with the list ranking and tree evaluation steps succeeding
with high probability. Likewise, there are only O(1) iterations to this algorithm. So we get the
following.

Theorem 9: Given a set S of n points in the plane, and a compressed quadtree T for S, we can construct
a well-separated pair decomposition for S, with each set being associated with a node in T, in O(n log n)
time with an oblivious computation that succeeds with high probability.
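
The O(s^2) bound on the size of each set B(B) in Step 1 can be made concrete with a small Python sketch. It assumes that each box is enclosed in its circumscribed ball, of radius L/sqrt(2) for a box of side L; the text above does not fix this choice. Under that assumption, two same-size boxes whose grid positions differ by (di, dj) fail to be well-separated exactly when di^2 + dj^2 < 2(1 + s/2)^2. The name nearby_offsets is hypothetical.

```python
import math


def nearby_offsets(s=2.1):
    """Grid offsets (di, dj) of same-size boxes that are NOT well-separated
    from a given box, assuming circumscribed balls around the boxes."""
    # Center distance L*sqrt(di^2 + dj^2) must be >= (2 + s) * L / sqrt(2)
    # for separation, i.e. di^2 + dj^2 >= 2 * (1 + s/2)^2.
    limit = 2.0 * (1.0 + s / 2.0) ** 2
    k = int(math.sqrt(limit))
    return [(di, dj)
            for di in range(-k, k + 1)
            for dj in range(-k, k + 1)
            if (di, dj) != (0, 0) and di * di + dj * dj < limit]


offsets = nearby_offsets(2.1)
count = len(offsets)
```

For s = 2.1 this yields the 24 boxes surrounding B in a 5 x 5 block, a constant independent of n, so the loop in Step 6 runs a constant number of times.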

5 Closest Pairs and All Nearest Neighbors 

Having presented all the above algorithmic techniques, we are now ready to describe our oblivious algorithm 
for solving the all nearest neighbors problem. 

So, let us assume we are given a set S of n points in the plane for which we want to solve the all nearest
neighbors problem. At a high level, our method is an oblivious adaptation of a parallel all nearest neighbors
algorithm of Callahan and Kosaraju [14].

1. Construct a compressed quadtree T for the points of S, using the method of Theorem 8.

2. Construct a well-separated pairs decomposition (WSPD), based on T, using the method of Theorem 9.

3. Discard each pair (A, B) in the WSPD if neither A nor B is a singleton set. (Note: if all we want is
a closest pair, then we can skip the remaining steps and find the closest of all the singleton-singleton
pairs in the WSPD.)

4. For each box B in the WSPD for which there is at least one remaining pair, ({a}, B), construct the set
N(B) of all such points, a. We represent this information obliviously as a collection of pairs (a, B)
where a is a point and B is a box, padded with null items.

5. For each box B, partition the plane into a set of O(1) wedges having the center o of B as their
apex, and prune N(B) to contain only the closest point to o within each wedge (by replacing the
pairs representing other points by null items), with the number of wedges chosen according to the
parameters of the WSPD so that for each pruned point there is another point of N(B) that is closer to
it than every point in B is. The set of remaining points in each set N(B) will have size O(1).



6. Using a tree contraction algorithm of Callahan and Kosaraju [14], construct for each leaf v of T, which
is associated with a point b, the set N'(b), which is the set of all points a such that (a, B) is in the
WSPD for an ancestor of v in T and such that a's distance to b is no larger than the minimum distance
from a to other points in N(B). In other words, N'(b) consists of all those points of S that could have
b as a nearest neighbor. This step takes O(n log n) time to implement obliviously, by Theorem 7.

7. For each point a in a set N'(b), construct the pair (a, b). Sort all these pairs so as to bring together,
for each point a, those points that could be a nearest neighbor to a. Then perform a scan of this list to
determine, for each a, its nearest neighbor. This step takes O(n log n) time.
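
The wedge pruning of Step 5 above can be sketched in Python as follows. The sketch assumes k equal-angle wedges around the box center o, with k = 8 chosen arbitrarily; how k must depend on the WSPD parameters is not worked out here. The output has a fixed length of k, padded with None, mirroring the null-padded representation used in the steps above. The name prune_by_wedges is hypothetical, and the loop uses ordinary data-dependent branches rather than oblivious primitives.

```python
import math


def prune_by_wedges(o, points, k=8):
    """Keep, for each of k equal-angle wedges with apex o, only the point
    closest to o. Returns a list of length k padded with None."""
    best = [None] * k
    width = 2.0 * math.pi / k
    for p in points:
        angle = math.atan2(p[1] - o[1], p[0] - o[0]) % (2.0 * math.pi)
        w = min(int(angle / width), k - 1)   # guard against rounding at 2*pi
        if best[w] is None or math.dist(o, p) < math.dist(o, best[w]):
            best[w] = p
    return best


kept = prune_by_wedges((0, 0), [(1, 0), (2, 0), (-1, 3), (-1, -1)], k=4)
```

In this example (2, 0) is pruned because (1, 0) lies in the same wedge and is closer to o, and the wedge with no points keeps a None entry.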

Each of the above steps can be implemented in O(n log n) time, either because of the specific results from
the referenced theorems, or because the step is easily performed obliviously by making a constant number
of calls to an oblivious sorting routine.
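
As an illustration of what an oblivious sorting routine looks like, the following Python sketch implements a bitonic sorting network. It is a stand-in chosen for brevity: it performs O(n log^2 n) compare-exchange operations, so it does not match the O(n log n) bounds claimed above, and it assumes the input length is a power of two (shorter inputs would need padding). Its relevant property is that the sequence of positions compared depends only on n, never on the values. The name bitonic_sort is hypothetical.

```python
def bitonic_sort(items):
    """Sort a list whose length is a power of two using a fixed sequence
    of compare-exchange operations (a bitonic sorting network)."""
    a = list(items)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Compare-exchange: the only data-dependent operation.
                    lo, hi = (a[i], a[partner]) if a[i] <= a[partner] else (a[partner], a[i])
                    a[i], a[partner] = (lo, hi) if ascending else (hi, lo)
            j //= 2
        k *= 2
    return a


pairs = [(3, "c"), (1, "b"), (2, "a"), (1, "a")]
ordered = bitonic_sort(pairs)
```

Tuples are compared lexicographically, which is the kind of ordering rule used when sorting pairs and labeled tuples in the steps above.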

Theorem 10: Given a set S of n points in the plane, we can compute the nearest neighbor in S for each
point in S and find a closest pair of points in S with an oblivious computation running in O(n log n) time.



Starting from the data-oblivious algorithms of Theorem 10 we can then apply standard cryptographic
circuit simulation methods to derive a secure multiparty computation involving private data (e.g., see
[7, 20, 21, 35, 36]). Hence, we obtain a secure two-party protocol for Alice and Bob to compute either the closest
pair or the nearest neighbor in the union of their n points for each of their respective points, but otherwise
learn nothing about the other person's points.

Corollary 11: There is a secure two-party protocol that computes the all nearest neighbors and a closest
pair in the union of two private sets of points of total size n with O(n log n) communication complexity.



The result of Corollary 11 is perhaps counter-intuitive, in that one might, at first, believe that such a
computation reveals all of the points in question. However, if Alice and Bob's respective sets of points
are relatively well-separated, then each of them would learn almost nothing from a two-party all nearest
neighbors computation, for, in this case, each of their respective points has a nearest neighbor in its same
original set.



6 Conclusion 

We have given efficient oblivious algorithms for a number of geometric problems, which are natural problems
to arise in privacy-preserving protocols for computing functions of points that are derived from the
coordinates of actors in various location-based services. We have also given oblivious algorithms for several
fundamental combinatorial problems. There are a host of open problems, however, that might be of interest
in privacy preserving computations, including the following:
• Given a set S of n vertical and horizontal line segments, can one obliviously compute in O(n log n)
time the number of pairs of segments in S that intersect?

• Given a set S of n points in the plane, can one construct a representation of the Delaunay triangulation
of S obliviously in O(n log n) time?

• Given a set S of n points in R^3, can one construct a representation of the convex hull of S obliviously
in O(n log n) time?

• Given a simple polygon P of size n, can one construct a representation of a triangulation of P obliviously
in O(n log n) time? If so, is this the fastest time possible for an oblivious algorithm?